Accuracy of heart failure ascertainment using routinely collected healthcare data: a systematic review and meta-analysis

Background: Ascertainment of heart failure (HF) hospitalizations in cardiovascular trials is costly and complex, involving processes that could be streamlined by using routinely collected healthcare data (RCD). The utility of coded RCD for HF outcome ascertainment in randomized trials requires assessment. We systematically reviewed studies assessing RCD-based HF outcome ascertainment against “gold standard” (GS) methods to study the feasibility of using such methods in clinical trials.

Methods: Studies assessing International Classification of Diseases (ICD) coded RCD-based HF outcome ascertainment against GS methods and reporting at least one agreement statistic were identified by searching MEDLINE and Embase from inception to May 2021. Data on study characteristics, details of RCD and GS data sources and definitions, and test statistics were reviewed. Summary sensitivities and specificities for studies ascertaining acute and prevalent HF were estimated using a bivariate random effects meta-analysis. Heterogeneity was evaluated using I² statistics and hierarchical summary receiver operating characteristic (HSROC) curves.

Results: A total of 58 studies of 48,643 GS-adjudicated HF events were included in this review. Strategies used to improve case identification included the use of broader coding definitions, combining multiple data sources, and using machine learning algorithms to search free text data, but these methods were not always successful and at times reduced specificity in individual studies. Meta-analysis of 17 acute HF studies showed that RCD algorithms have high specificity (96.2%, 95% confidence interval [CI] 91.5–98.3), but lacked sensitivity (63.5%, 95% CI 51.3–74.1) with similar results for 21 prevalent HF studies. There was considerable heterogeneity between studies.

Conclusions: RCD can correctly identify HF outcomes but may miss approximately one-third of events. Methods used to improve case identification should also focus on minimizing false positives.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13643-024-02477-5.


Introduction
Heart failure (HF) is an important cause of morbidity and mortality in the general population affecting 1-3% of adults, with over 64 million people estimated to be affected worldwide [1-3]. It is a significant burden on healthcare systems, accounting for about 2% of all healthcare expenditure in countries across Europe and the USA [1,2]. Therefore, HF is an important target for treatment, requiring large randomized, controlled trials to assess potential interventions. Such large trials can be complex and costly [4,5]. Ascertainment of HF admissions in a clinical trial often requires clinic visits (with or without manual medical records review) to identify potential events, gathering clinical documents for reported events, and independent clinical adjudication to confirm or refute events. This process could be streamlined to reduce the complexity and overall cost of trials [6-8]. Routinely collected healthcare data (RCD) may help to achieve this goal by supporting the ascertainment of HF outcomes during within-trial periods, and post-trial assessments of the impact on longer-term HF risk [9].
RCD is defined as "healthcare data collected for purposes other than research or without specific a priori research questions developed before collection" [10]. When patients are diagnosed with HF during a healthcare encounter, this diagnosis, along with other data relating to the encounter, is recorded in RCD, usually in the form of coded diagnoses. The most common RCD source is hospital administrative claims data (ACD), an umbrella term for data generated as part of the financial administration of hospitals [11,12]. Other RCD sources include patient or disease registries and epidemiological surveys (detailed definitions of RCD sources used are provided in Additional file 1: Supplemental Methods). RCD can be used to ascertain events by searching the data for specific codes or coding algorithms.
Ascertaining hospitalizations for HF from such sources can be problematic as HF is a chronic disease with episodes of decompensation requiring admission, and commonly used coding systems do not distinguish between acute events and prevalent chronic disease.
A meta-analysis published in 2014 of 11 studies reporting sensitivity and specificity of coded administrative data for ascertaining HF showed that pooled sensitivity was 75% (95% confidence interval [CI] 74.7-75.9) and pooled specificity was 97% (95% CI 96.8-96.9) [13]. These findings mirrored two previous reviews [14,15]. However, there was a limited number of studies in this review, and some studies had very small numbers of HF events. It is also possible that coding practices have improved over the last decade. A systematic review from 2020, focused entirely on Europe and including 20 studies using electronic health records and primary care data, reported sensitivities ≤ 66% and specificities ≥ 95% in most of the studies [16]. However, it excluded other data sources such as claims databases and registries and was geographically restricted. We have systematically reviewed all studies that assessed the utility of RCD for HF outcome ascertainment to summarise the currently available evidence supporting their use in cardiovascular outcomes trials.

Methods
This review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for conducting and reporting a systematic review [17].

Search strategy
A search was conducted of all available peer-reviewed literature on MEDLINE and Embase, from their inception (1946 and 1974 respectively), until May 2021 using the Ovid search engine. The initial search strategy was broad and aimed to include any studies where RCD was used to ascertain HF. No limits were set for the initial search. Multiple search terms, including different phrasings or synonyms of the same term, were used (see Systematic Review Protocol in the Supplementary Appendix for search strategy and inclusion criteria). After removing duplicates, the titles and abstracts of potentially eligible articles were reviewed and those meeting the inclusion criteria underwent full-text review. The references of the full-text papers were hand-searched for additional relevant articles.

Inclusion and exclusion criteria
To be included in the review, a study was required to assess the utility of coded RCD for ascertainment of HF against gold standard (GS) ascertainment criteria. We selected full-length, peer-reviewed articles published in English that used RCD to ascertain HF events and reported at least one agreement statistic, or sufficient data to allow its calculation, for International Classification of Diseases (ICD) code-based definitions of HF. All included studies were required to define a GS against which to assess the RCD-based ascertainment method and to include at least 50 HF events identified using the GS method relevant to that study. The GS method is defined as the reference standard against which each study assessed its RCD-based outcome ascertainment method. Examples include medical records review using pre-specified criteria. Articles were excluded if they used free-text electronic medical records (i.e., narrative clinical notes) as the sole RCD source as these would be considered medical records and are often used as the GS for event adjudication (see Systematic Review Protocol in Supplementary Appendix for detailed exclusion criteria).

Data extraction
The full-text articles were reviewed by the first author (MAG) who abstracted the data into a data collection form. The author extracted study characteristics, details of the data sources (RCD and GS), type of hospital encounter (e.g., inpatient, outpatient, or emergency department attendances), and data definitions used, along with agreement statistics for the ICD code or coding algorithm used to ascertain HF. The agreement statistics extracted included sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and kappa scores. Where agreement statistics were unavailable, raw data was extracted for calculation where possible. Most routine databases list the main reason for hospitalization (most responsible diagnosis) in a primary position and secondary complications or pre-existing comorbidities in secondary positions. As the distinction between these categories is likely to be important in ascertaining incident episodes of heart failure (e.g., hospitalization due to HF decompensation) as potential trial outcomes, the coding positions and agreement statistics according to coding position were also abstracted where available. If a study used more than one RCD definition or algorithm, the algorithm with the best agreement statistics was used for the main analysis.
Studies were categorized according to which types of RCD-based and GS HF events were included. Studies that only included hospitalizations for decompensated HF, irrespective of a prior HF diagnosis, were categorized as acute HF studies. These studies were the main focus of the analysis as such methods could be used to collect follow-up information in a clinical trial. Studies that included all individuals with HF recorded over the study period (new and pre-existing HF) were categorized as prevalent HF studies. Such methods could be used to identify potential participants for inclusion in clinical trials. Studies that defined HF as a comorbid disease in individuals admitted with another main diagnosis such as myocardial infarction were also included in the prevalent HF category.
If a study assessed both acute and prevalent HF, or different ICD versions, or more than one coding position separately, the agreement statistics were extracted for all relevant event types or RCD algorithms for subgroup analysis. The first author conducted a second review of the abstracted data comparing them against the original abstract to correct any discrepancies in the data collection form. Any uncertainties were resolved through discussion with two senior clinicians (MMM and RJH).

Study quality assessment
A quality assessment of the included studies was undertaken using the revised tool for Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) [18]. Three authors (WK, ME, and AEM) independently reviewed the studies and extracted data using the QUADAS-2 template, and the first author reviewed and collated the final quality assessment. Studies were classified as having a low, high, or unclear risk of bias for 4 domains (patient selection, index test, reference standard, and flow and timing) and the first 3 domains were also assessed for applicability to the review question (see Supplemental Methods in Additional file 1 for details). Studies were considered to have a "low risk" of bias or "low concern" regarding applicability if all domains were low risk. If one or more domains had unclear or high risk the study was considered to be "at risk" of bias or have "some concerns" regarding applicability. A sensitivity analysis excluding studies at risk of bias was undertaken.

Statistical analysis
Studies were grouped according to whether they assessed acute or prevalent HF. Other potential sources of heterogeneity included coding system, position and definitions used, RCD and GS data source, study size, publication date, and country or region (e.g., Europe). All agreement statistics (sensitivity, specificity, PPV or NPV) and 95% CI (exact binomial CI) were calculated using available data (see Additional file 1: Figure S1 for an example 2 × 2 table) [19]. Summary sensitivity, specificity, and a summary receiver operating characteristic (SROC) plot with a summary curve (using the hierarchical SROC model) were obtained using the Stata command metandi [20]. As these are random effects models that may give undue weight to smaller studies, an additional sensitivity analysis was undertaken limited to studies with > 200 GS events.
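As an illustration of the per-study calculation described above, the following minimal Python sketch derives the four agreement statistics from a 2 × 2 table and attaches exact binomial (Clopper-Pearson) 95% CIs. The function names and the example counts are hypothetical and are not taken from the review or from any included study; the review itself used Stata.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def exact_ci(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for the proportion x/n."""
    def bisect(p_is_too_low):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if p_is_too_low(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower limit solves P(X >= x | p) = alpha/2; upper limit solves P(X <= x | p) = alpha/2.
    lower = 0.0 if x == 0 else bisect(lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2)
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper

def agreement_statistics(tp, fp, fn, tn):
    """Estimate and exact 95% CI for each statistic; rows are RCD result, columns are GS result."""
    cells = {
        "sensitivity": (tp, tp + fn),  # RCD-positive among GS-positive
        "specificity": (tn, tn + fp),  # RCD-negative among GS-negative
        "ppv": (tp, tp + fp),          # GS-positive among RCD-positive
        "npv": (tn, tn + fn),          # GS-negative among RCD-negative
    }
    return {name: (x / n, *exact_ci(x, n)) for name, (x, n) in cells.items()}

# Hypothetical counts for illustration only.
stats = agreement_statistics(tp=64, fp=8, fn=36, tn=192)
for name, (est, lo, hi) in stats.items():
    print(f"{name}: {est:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

The sketch assumes every margin of the table is non-zero; a study with no GS-negative participants, for example, would not support a specificity estimate.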
The I² statistic was used to assess heterogeneity between the sensitivity and specificity estimates in addition to visual inspection of the HSROC curves [21]. All analyses were performed using Stata version 17.
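For readers unfamiliar with the I² statistic, a minimal Python sketch of one common definition is given here: I² = max(0, (Q − df)/Q) × 100%, where Q is Cochran's heterogeneity statistic from an inverse-variance weighted analysis. This is an assumption about the general form only; the scale on which sensitivities and specificities were pooled is not stated in this section, so the inputs are left generic and the function name is hypothetical.

```python
def i_squared(estimates, variances):
    """I-squared (%) from study estimates and their within-study variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical estimates (e.g., on a logit scale) and variances, for illustration only.
print(i_squared([0.1, 0.5], [0.01, 0.01]))
```

Values close to 100% indicate that almost all of the observed variation between study estimates is attributable to heterogeneity rather than chance.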
Formal testing for publication bias was undertaken by a regression of the log diagnostic odds ratio against 1/√effective sample size (ESS), weighted by ESS, with a P < 0.05 for the slope coefficient indicating significant asymmetry [22] (see Additional file 1, Supplemental Methods, Statistical Methods and Interpretation for details).
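The quantities entering this regression can be sketched in Python as follows. The sketch assumes the usual definition of the effective sample size, ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of GS-positive and GS-negative participants, and adds a 0.5 continuity correction when a cell is zero; both are assumptions of this illustration rather than details confirmed by the text. Only the weighted slope is computed; the P value for the slope is not derived here. All function names and counts are hypothetical.

```python
import math

def deeks_inputs(tp, fp, fn, tn):
    """Return (ln DOR, 1/sqrt(ESS), ESS) for one study's 2 x 2 table."""
    if 0 in (tp, fp, fn, tn):
        # Continuity correction (assumption of this sketch) to avoid log(0).
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    ln_dor = math.log((tp * tn) / (fp * fn))
    n_pos, n_neg = tp + fn, fp + tn  # GS-positive and GS-negative totals
    ess = 4 * n_pos * n_neg / (n_pos + n_neg)
    return ln_dor, 1 / math.sqrt(ess), ess

def weighted_slope(x, y, w):
    """Slope b of the weighted least-squares line y = a + b*x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

# Hypothetical 2 x 2 tables (tp, fp, fn, tn), for illustration only.
tables = [(64, 8, 36, 192), (150, 30, 50, 770), (40, 5, 60, 295), (300, 20, 100, 1580)]
ln_dors, xs, esses = zip(*(deeks_inputs(*t) for t in tables))
slope = weighted_slope(xs, ln_dors, esses)
print(f"slope of ln(DOR) on 1/sqrt(ESS): {slope:.2f}")
```

A slope far from zero relative to its standard error would suggest funnel plot asymmetry.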

Study selection
The initial Embase and MEDLINE searches yielded 2790 articles in total and an additional 56 records were identified through a manual search of references during full-text review. After the removal of duplicates and non-English language articles and abstract review, 129 articles were selected for full-text review. Of these, 71 were excluded and 58 articles were included in the final synthesis (Fig. 1).

Study quality assessment
The overall risk of bias was low for 28 (48%) studies (Additional file 1: Table S3). Of the remaining 30 studies, 7 had at least one high-risk domain and 23 had one or more domains with unclear risk of bias. Of 7 studies with high-risk domains, 6 had a reference standard at risk of not correctly classifying the target condition [28,57,68,70,71,79] while, in one study, patients were inappropriately excluded from the analysis as they did not receive the reference standard [57]. Concerns regarding applicability were low for 42 studies (72%). Fourteen of the 16 studies with "some concern" regarding applicability were also considered "at risk" for overall risk of bias, with concerns about the reference standard being the most common issue in both areas.

Gold standard data sources and definition
Forty-nine (85%) studies used hospital medical records as the GS data source (Additional file 1: Table S4 summarizes the sources of routine and GS data).The remaining studies used primary care records (2 studies) [49,76], and specialty databases or registries containing coded clinical data (5 studies) [12,24,35,42,57].One study assessed outcomes against participant self-report [71], and another study conducted prospective medical assessments and echocardiography [37].
Most studies (85%) undertook a further adjudication step of the GS source data, conducting clinical adjudication of the medical records according to study-defined or guideline criteria. Three studies used the recoding of medical records by professional coders as the GS source [28,68,79] while the remaining six studies did not undertake any adjudication (Additional file 1: Table S5 summarizes the GS ascertainment methods used, and Table S6 the main guideline criteria used for GS adjudication).

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart summarising the study selection process. Legend: EMR indicates electronic medical records; GS, gold standard; HF, heart failure; n, number of records; and RCD, routinely collected healthcare data.

The coding algorithms used varied considerably between studies. Four (7%) studies did not define the specific coding algorithm used [25,29,58,62]. The commonest ICD-9 code used was 428.x (heart failure) alone (17 studies) or in combination with other codes (20 studies). The commonest ICD-10 code used was I50.x (heart failure) alone (9 studies) or in combination with others (15 studies). Additional file 1: Tables S7 and S8 summarize the ICD-9 and -10 coding algorithms used respectively, while Additional file 1: Table S9 includes a list of all the HF codes used in the studies along with their definitions.
Most studies specified the ICD HF code position (primary, secondary, any) within the database. Among 37 studies ascertaining acute HF, 4 (11%) studies reported algorithms with HF codes in the primary position and any position separately [28,30,44,56], 11 (30%) only reported algorithms with HF codes in the primary position, and 21 (57%) only reported algorithms with codes in any position. One study algorithm (2%) used codes in positions 1-6 [36].
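To make the idea of a narrow code-based algorithm and coding position concrete, the following Python sketch flags a hospital episode as HF when a 428.x (ICD-9) or I50.x (ICD-10) code appears in the primary position or in any position. The record layout (an ordered list of diagnosis codes with the primary diagnosis first) and the function names are assumptions for illustration; real RCD sources differ in structure, and the included studies used a variety of broader algorithms.

```python
def is_hf_code(code, icd_version):
    """True if the code falls under 428.x (ICD-9) or I50.x (ICD-10)."""
    normalised = code.strip().upper().replace(".", "")
    if icd_version == 9:
        return normalised.startswith("428")
    if icd_version == 10:
        return normalised.startswith("I50")
    raise ValueError("icd_version must be 9 or 10")

def has_hf(diagnoses, icd_version, position="any"):
    """Search an ordered list of diagnosis codes (primary diagnosis first)."""
    if position == "primary":
        candidates = diagnoses[:1]
    elif position == "any":
        candidates = diagnoses
    else:
        raise ValueError("position must be 'primary' or 'any'")
    return any(is_hf_code(code, icd_version) for code in candidates)

# Hypothetical episode: myocardial infarction coded first, HF coded second.
episode = ["I21.0", "I50.9", "E11.9"]
print(has_hf(episode, 10, position="any"))      # HF code present in a secondary position
print(has_hf(episode, 10, position="primary"))  # primary diagnosis is not HF
```

Restricting to the primary position captures fewer episodes than searching any position, which is the trade-off examined in the within-study comparisons of coding position.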

Results of individual studies
Table 1 summarizes the agreement statistics of the main study algorithm(s) for each study considering acute HF grouped by country (as RCD sources are likely to be similar) and ordered by sensitivity or PPV (highest to lowest). There was a wide range of performance across studies with sensitivities ranging from as low as 13% to > 90%. Only 8/23 (35%) studies reported a sensitivity > 80%. Although specificity also ranged widely between 20 and > 90%, 17/21 (81%) studies reported a specificity > 80%.

Meta-analysis
Sufficient data for meta-analysis was available for 17,986 GS HF events from 17/37 studies assessing RCD for acute HF. The funnel plot for publication bias with the superimposed regression line is shown in Additional file 1: Figure S2. The slope coefficient was not statistically significant (P = 0.73), indicating a symmetrical funnel plot and a low likelihood of publication bias.
Table 2 provides the summary statistics for acute and prevalent RCD algorithms overall and according to the diagnostic position of HF codes. The summary sensitivity and specificity for acute HF studies were 63.5% (95% CI 51.3-74.1) and 96.2% (95% CI 91.5-98.3) respectively (Table 2). The agreement was similar in studies which included codes in the primary diagnostic position and any diagnostic position. When the analysis was restricted to 14 studies (17,540 GS HF events in total) with > 200 GS HF events the summary sensitivity was lower while specificity remained unchanged (Table 2 and Additional file 1: Figure S3a). When the analysis was restricted to 9 studies at low risk of bias, summary sensitivity was lower while specificity was similar (Table 2).
Figure 2 shows the forest plot of paired sensitivities and specificities for acute HF studies. There was marked heterogeneity between studies ascertaining acute HF (I² 99.3% and 99.7% for sensitivity and specificity respectively). The SROC plot for acute HF (Fig. 3a) has a wide 95% prediction region with individual study algorithms scattered away from the HSROC curve, also suggesting considerable heterogeneity between studies, with no clear relationship between sensitivity and specificity. Heterogeneity remained regardless of the coding position used (Additional file 1: Figure S4).

Subgroup analysis
Given the significant heterogeneity between studies, Additional file 1: Table S10 summarises agreement statistics for studies ascertaining acute HF according to other subgroups of interest that are potential sources of heterogeneity. There was considerable heterogeneity with I² ≥ 98% within all subgroups (Additional file 1: Table S10). Some of these subgroups only included a small number of studies and the summary results should be interpreted with caution.

Results of individual studies
Table 3 summarizes the agreement statistics of the main study algorithm(s) for each study ascertaining prevalent HF grouped by country and ordered by sensitivity or PPV (highest to lowest).
There was a wide range of performance across studies similar to acute HF studies, but a specificity ≥ 90% was reported by all 22 studies reporting specificities while only 27% reported a sensitivity ≥ 80%.

Meta-analysis
Twenty-one of 24 studies (including 19,840 GS HF events) ascertaining prevalent HF provided sufficient data for meta-analysis. Statistical testing for publication bias showed no significant asymmetry (P value = 0.57) indicating a low likelihood of publication bias (Additional file 1: Figure S2). The overall summary sensitivity and specificity were 63.7% (95% CI 55.3-71.3) and 98.1% (95% CI 97.0-98.8) respectively (Table 2). The result of restricting the analysis to 10 studies with > 200 GS events was similar to the impact on acute HF (Table 2 and Additional file 1: Figure S3b). Restricting the analysis to 8 studies at low risk of bias produced similar summary sensitivity and specificity to the overall result (Table 2).
Figure 4 shows the forest plot of paired sensitivities and specificities for prevalent HF studies. There was significant heterogeneity between studies similar to acute HF studies (Table 2, Fig. 3b, Additional file 1: Figure S5).

Discussion
RCD sources are becoming increasingly accessible to researchers and are an invaluable resource for cost-effective, streamlined clinical research. The present study demonstrated that acute HF outcomes ascertained using RCD have good specificity (96%) but lack sensitivity (63%) with similar results for prevalent HF outcomes. This indicates that whilst RCD-based ascertainment is effective at correctly identifying people who have HF, it misses approximately one-third of cases, suggesting that further improvements are required in HF outcome ascertainment methods. The wide confidence interval around the summary estimate of sensitivity (95% CI 51.3-74.1) is compatible with RCD-based ascertainment methods missing between 49% and 26% of acute heart failure cases. Furthermore, there was significant heterogeneity between studies and within subgroups which is not explained by differences in RCD coding algorithms, the GS or the country of origin, study size, or year of publication, suggesting there may be other factors such as differences in the populations studied. Therefore, both the summary statistics and subgroup analysis must be interpreted with caution.
A previous review suggested that the use of broader parameters along with laboratory and prescription data may help identify more cases [13]. However, this study has not been able to confirm this, as there were only a few studies using these sources. Eight studies used algorithms combining different sources, coding combinations, periods of data identification etc. [23,31,33,39,57,59,76,77]. However, the sensitivity in these studies was no different from other studies with simpler algorithms and RCD sources, indicating that the use of complex algorithms did not necessarily improve sensitivity [23,33,76].

Fig. 2 Forest plot of paired sensitivities and specificities of study algorithms ascertaining acute heart failure. Legend: Algorithms sorted by diagnostic code position. Summary points estimated using a bivariate random effects model. CI indicates confidence intervals; FN, false negatives; FP, false positives; I², I² statistic describing the percentage of variation across studies that is due to heterogeneity rather than chance; TN, true negatives; and TP, true positives.

Fig. 3 SROC plots for the diagnostic accuracy of coding algorithms ascertaining acute and prevalent heart failure. Legend: a Acute heart failure (HF) algorithms and b Prevalent HF algorithms. HSROC indicates hierarchical summary receiver operating characteristic curve; grey circle, the sensitivity and (1-specificity) of an individual study with the size of the circle proportionate to study size; summary point, summary sensitivity and specificity; 95% confidence region, 95% confidence region for the summary point; and 95% prediction region, the area in which we can say with 95% certainty the true sensitivity and specificity of a future study will be contained.

Using multiple codes from the same source compared to I50.x/428.x alone (broad vs narrow algorithms) has also not led to a significant increase in sensitivity for acute HF studies (67.1% vs 70.7%) in this meta-analysis (Additional file 1: Table S10). However, this comparison is again between the results of different studies. One study of 99 GS events compared several narrow versus broad coding definitions and found no difference in diagnostic accuracy [76]. Although using machine learning algorithms or keyword searches of free-text entries improved sensitivity, this came at the cost of lower specificity in individual studies [72,73].

Characteristics of better performing algorithms
There were 5 studies with acute HF algorithms that performed above the estimated average with sensitivities > 75% while maintaining specificities > 90% [27,28,32,47,79]. However, two of these used re-coded medical records as the GS to assess coding practices [28,79] and all of these studies were considered 'at risk' of bias. The use of recoded data may not be a true reflection of the actual presence or absence of disease and may explain the high concordance. In contrast, three studies using registry data as the GS source had worse sensitivities than average (Table 1). This suggests that differences in the GS may explain some of the variation between studies. The only commonalities of the remaining 3 high-performing studies were the use of ICD-9 coded inpatient hospital discharge data (HDD) as the RCD source and adjudicated medical records as the GS.
Prevalent HF studies performed better with 12 studies demonstrating sensitivities > 75% while maintaining specificities > 96%. Five of these studies used RCD from Canadian hospital discharge abstract databases which are coded according to national standards [63,65,72,76,79]. One of these combined HDD with physician billing data obtaining a sensitivity and specificity of 84.8% and 97.0% respectively (Table 3) [76]. One Canadian study increased its sensitivity from 57.4% (95% CI 51.8-63.0) using an ICD-10 code search of HDD alone to 83.3% (95% CI 73.9-72.8%) by combining the code search with a machine learning algorithm of unstructured free-text entries while maintaining specificity [72]. Similar results were obtained by a German study where combining an ICD-10 code search of HDD with a machine learning algorithm of unstructured free-text improved sensitivity from 49.5% (95% CI 42.8-56.3) to 83.8% (95% CI 78.3-88.4) [73]. The study with the highest sensitivity, specificity, and kappa scores was an Australian study which again used re-coded medical records as the GS, which may explain the high concordance [68].

Fig. 4 Forest plot of paired sensitivities and specificities of study algorithms ascertaining prevalent heart failure. Legend: Algorithms sorted by diagnostic code position. Summary points are estimated using a bivariate random effects model. CI indicates confidence intervals; FN, false negatives; FP, false positives; I², I² statistic describing the percentage of variation across studies that is due to heterogeneity rather than chance; TN, true negatives; and TP, true positives.

Limitations of review
There are some limitations to this review. The availability of agreement statistics and information such as the coding algorithms used was variable and made direct comparison between all studies difficult. The quality of the available studies was variable with about half of studies assessed as 'at risk' of bias. However, restricting to studies with 'low risk' of bias resulted in similar summary estimates of sensitivity and specificity.
This meta-analysis utilizes the currently recommended bivariate and HSROC models which are random effects models that may give undue weight to smaller studies. However, the aim of the meta-analysis is not to present an exact summary but an overall estimate of the likely average sensitivity and specificity of using RCD for ascertainment of HF outcomes. The potential impact of using random-effects meta-analysis was assessed by doing an additional analysis limited to studies with > 200 GS events.
The comparisons between the different algorithms were limited as they were assessed in diverse study populations rather than within the same population, requiring cautious interpretation of the summary statistics and subgroup analysis. For example, a possible impact of the coding position was demonstrated in the meta-analysis results, with studies ascertaining acute HF in the primary position having better summary sensitivity and specificity than those using codes in any position (Table 2). However, four acute HF studies assessing the impact of coding position on diagnostic performance within each study all showed that using codes in the primary position reduces sensitivity and improves specificity compared to codes in any position (Table 1) [28,30,44,56].
This review was also restricted to English language articles and 24 abstract-only studies were excluded. This may have led to publication bias along with any studies that may have been withheld from publication due to poor validation statistics. However, there was no statistically significant publication bias detected.
The WHO ICD-8, -9, and -10 codes do not support separate coding of HF sub-types (e.g., HF with preserved ejection fraction). Although some studies did include additional codes from the ICD-CM codes (USA) and the ICD-CA codes (Canada), this review could only assess the ascertainment of acute HF and prevalent HF irrespective of subtype. The implementation of the new WHO ICD-11 codes, which include heart failure codes capturing preserved, mid-range, and reduced ejection fraction, may allow HF subtypes to be captured in the future [80].

Practical implications and future directions
When using acute HF outcomes to assess treatment effects in trials, a high false negative rate (low sensitivity) will have no impact on the point estimate of the overall treatment effect (provided the missing events are evenly distributed between the control arm and active arm), but it will reduce the statistical power of the trial and lead to widening of confidence intervals. In contrast, low specificity (high false positive rate) can lead to underestimation of treatment effects. Therefore, it is important to ensure that any steps taken to improve the sensitivity of HF algorithms have minimal impact on specificity. A logical way to achieve this may be to broaden the diagnostic codes used to capture HF (and/or combine more than one data source) as attempted by some studies and add a second method to maintain specificity such as a manual review of RCD records by clinicians to confirm or refute suspected events. This second method is less resource-intensive than GS adjudication of medical records and may improve diagnostic accuracy in a similar way to using machine learning algorithms on free text entries but has not been used in any of the studies reviewed [72,73].
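The reasoning about sensitivity and specificity can be illustrated with a short Python sketch. It assumes non-differential misclassification (the same sensitivity and specificity in both arms) and expresses the treatment effect as a risk ratio; the true event risks are hypothetical, while the sensitivity and specificity are the summary estimates for acute HF reported in this review. Under these assumptions the observed risk in an arm is sensitivity × true risk + (1 − specificity) × (1 − true risk).

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected proportion of participants flagged as having an event."""
    true_positives = sensitivity * true_risk
    false_positives = (1 - specificity) * (1 - true_risk)
    return true_positives + false_positives

def observed_risk_ratio(risk_control, risk_active, sensitivity, specificity):
    """Risk ratio (active vs control) as it would appear with imperfect ascertainment."""
    return (observed_risk(risk_active, sensitivity, specificity)
            / observed_risk(risk_control, sensitivity, specificity))

# Hypothetical true risks: 10% in the control arm, 8% in the active arm (true risk ratio 0.80).
risk_control, risk_active = 0.10, 0.08

# Low sensitivity with perfect specificity: events are lost, but proportionally in both arms.
rr_low_sens = observed_risk_ratio(risk_control, risk_active, sensitivity=0.635, specificity=1.0)

# Adding imperfect specificity: false positives dilute the contrast between arms.
rr_low_spec = observed_risk_ratio(risk_control, risk_active, sensitivity=0.635, specificity=0.962)

print(f"true RR 0.80, low sensitivity only: {rr_low_sens:.2f}")
print(f"true RR 0.80, low sensitivity and imperfect specificity: {rr_low_spec:.2f}")
```

With perfect specificity the risk ratio is preserved although fewer events are counted, which reduces power; with imperfect specificity the observed risk ratio moves towards 1, so the treatment effect is underestimated.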
Finally, the considerable variation in agreement statistics between studies may be related to differences in coding practices. Therefore, any new RCD source or ascertainment method is likely to require validation prior to use for HF outcome ascertainment.

Conclusions
While there is significant heterogeneity in studies assessing RCD-based HF outcome ascertainment, this study confirms that the presence of HF codes in RCD correctly identifies true HF but significantly underestimates events. Strategies used to improve case identification include the use of broader coding definitions, multiple data sources, and machine learning algorithms of free text data. However, these methods were not always successful and at times reduced specificity in individual studies. Therefore, methods used to improve case identification should also focus on minimizing false positives.

Table 1
Agreement statistics for the best ICD code-based algorithm(s) for acute heart failure studies

Table 2
Agreement statistics for coding algorithms ascertaining acute and prevalent heart failure according to coding position
CI indicates confidence intervals; HF, heart failure; I², I² statistic describing the percentage of variation across studies that is due to heterogeneity rather than chance; and N, number of study algorithms (the same study can contribute > 1 algorithm in the subgroups if > 1 diagnostic position used, or the same study assessed acute and prevalent HF).

Table 3
Agreement statistics for the best algorithm(s) assessing prevalent heart failure
Studies from one country using the same RCD source are shown in bold lettering. ACD indicates administrative claims data; Alg., algorithm; CI, confidence interval; EMR, electronic medical record; GS, gold standard; HDD, hospital discharge data; HF, heart failure; ICD, International Classification of Diseases; MLA, machine learning algorithm; Pos., position; PPV, positive predictive value; NPV, negative predictive value; and RCD, routinely collected data.